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This paper presents an algorithm for localization and mapping for a mobile robot 
using monocular vision and odometry as its means of sensing. The approach 
uses the Variable State Dimension filtering (VSDF) framework to combine as- 
pects of Extended Kalman filtering and nonlinear batch optimization. This 
paper describes two primary improvements to the VSDF. The first is to use an 
interpolation scheme based on Gaussian quadrature to linearize measurements 
rather than relying on analytic Jacobians. The second is to replace the inverse 
covariance matrix in the VSDF with its Cholesky factor to improve the compu- 
tational complexity. Results of applying the filter to the problem of localization 
and mapping with omnidirectional vision are presented. 

1 This work was supported in part by the National Aeronautics and Space Administration 
(NASA) under Cooperative Agreement MCC 2-1006 with the Universities Space Research 
Association (USRA). 



1 Introduction 


It may be required for a robot to enter an unknown environment and to con- 
currently explore the area and produce a map while maintaining an accurate 
estimate of its position. If the robot were to have an a priori map, then lo- 
calization with respect to the known map would be a relatively easy task. Al- 
ternatively, if the robot were to have a precise, externally referenced position 
estimate, then mapping would be a relatively easy task. However, problems 
in which the robot has no a priori map and no external position reference are 
particularly challenging. Such scenarios may arise for underwater robots, min- 
ing vehicles, planetary surfaces, or anywhere that maps are not available. This 
problem has been referred to as concurrent localization and mapping (CLM) and 
simultaneous localization and mapping (SLAM). We will use the latter in this 
paper. 

In the work presented here, we model the robot environment as a 2-D planar 
world, so that the rover pose at time i is the 3-dof parameter vector m; including 
position on the 2-D plane and orientation. Landmarks are assumed to be point 
features and the position of landmark j is the 2-dof parameter vector xj . The 
means of sensing considered in this work are an odometry measurement 

di = /(mi, mi i) + w ; (1) 

which measures the change in vehicle pose from i — 1 to i, and a bearing mea- 
surement 

b k =g( m i(k) ,x j(k) ) (2 ) 

which measures the bearing from the rover position m;( k ) to a landmark at 
position xj( k ) , where i(k) and j(k) indicate which pose and landmark correspond 
to bearing measurement k. Both Wi and Vh are modelled as i.i.d. Gaussian noise 
processes. For notational convenience, we denote the parameter vector including 
all unknowns as 6 = {mi , m 2 , . . . , xi , X 2 , - . . , xn } and the measurement vector 
z = {di, c? 2 , • • • , di, bi, 62 , ■ ■ ■ , bk}- The generating function for all measurements 
as a function of all parameters is 


z = h(0)+v, (3) 

where v ~ Af(0, R) is a normally distributed random variable and following 
the assumption of i.i.d. noise above, R is diagonal. The generating function 
for a single odometry or bearing measurement will be denoted z k = hk(6) + Vk 
where, with a slight change of notation from above, z k can be either type of 
measurement. 

The bearing only sensor model we use here is motivated by the use of monoc- 
ular vision, which is fairly cheap, small, robust, low weight and low power com- 
pared to active range-bearing sensors. Odometry is known to provide poor 
egomotion data but the goal is to see what can be done with these two simple 
sensing modalities alone. The incorporation of inertial measurement, external 
position references, and range sensors can only improve the end results. Further- 
more, other models of landmarks and map parameterizations are possible but 
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are not considered in this work. Finally, the method described here is extend- 
able to the full 3-D problem, although admittedly the problem is more complex 
and may require more sophisticated parameterization. 


2 Previous Work 

There are two primary sources of literature related to the problem considered 
here. Bearings-only localization and mapping is similar to the SLAM problem in 
robotics and to the Structure from Motion (SFM) problem in computer vision. 

Most of the SLAM literature in robotics explores the problem of sensor fusion 
for onboard egomotion sensing and range-bearing sensors such as radar, sonar, 
and lidar. Approaches such as iterated closest point (ICP) [1], Expectation- 
Maximization [2], and correlation [3] have been explored for the task. The 
predominant body of SLAM work uses Extended Kalman filtering (EKF) based 
approaches [4, 5, 6, 7] or related approaches such as Unscented Filter [8] or 
Covariance Intersection [9]. 

The photogrammetry and computer vision literature contain a significant 
amount of work related to the structure from motion (SfM) problem, in which 
monocular images alone are used to reconstruct the scene and recover the cam- 
era motion. Among the popular approaches are factorization [10], sequential 
multiframe geometric constraints [11], and nonlinear bundle adjustment [12]. 

The Variable State Dimension filter [13] is a combination of Extended Kalman 
filtering and nonlinear optimization. The filter was developed to be a recursive 
algorithm for Structure from Motion, and it has some of the characteristics of 
bundle adjustment and Kalman smoothing. The VSDF provides the foundation 
for the work in this paper. 

3 The VSDF Algorithm 

The variable state dimension filter (VSDF) combines aspects of the EKF with 
aspects of Gauss-Newton nonlinear optimization[14]. Since z k is a Gaussian 
random variable with mean h k ($) and variance R k , we can write the likelihood 
for z given 8 

p{z\8) <x e~ E,(^- h ^G) T R k - 1 ( Zk -h k (G) ( 4 ) 

Gauss-Newton optimization searches for the parameter which minimizes the 
negative log of the likelihood 

e = —log(p(z\8)) 

= ^(z k - h k (0)) T R k _1 (z k - h k (0)) (5) 

k 

In order to minimize this cost function, the algorithm starts with an estimate 
of the state vector 0o and computes 

a = ^ — H k T R k x (z k — h k (0)) (6) 

k 
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( 7 ) 


A = ^H k T R k x H k 

k 


where H k = 


9h k 

ae 


o 0 


is the measurement Jacobian, a = Ve is the gradient of (5) 


and A » V 2 e is an approximation to the Hessian[14]. The algorithm computes 
an update to the state estimate by solving the linear system 


A5 = a (8) 

and updating the parameter vector 

9 <-6-5 (9) 

Equations (6) through (9) are iterated to convergence, . Solutions found using 
Gauss-Newton are optimal in a least squares sense, which is also maximum 
likelihood for Gaussian noise. However, the vector 9 contains the entire map 
and the entire vehicle trajectory, which makes Gauss-Newton slow for large 
datasets. 

The VSDF provides a method for linearizing measurements, incorporating 
them into a Gaussian “prior” . The filter equations may be derived by linearizing 
terms on the right hand side of (6) and (7). Suppose we wish to replace the 
term involving z k in (5). We can compute a linear approximation to h k () 

h k (0)«h k (0 o )+H k (fl-flo) (10) 

and in order to minimize the new cost function we simply replace the corre- 
sponding term in (6) and (7) with the same linearization. 

In the VSDF, terms are replaced with a linearization of the form 

(9 - 9 0 ) T A k (9 - 9 0 ) » 

(z k - H k (0 - 9 0 )) T R k 1 (z k - H k (tf - ff 0 )) (11) 

where A k = is the contribution of measurement k to the Hessian, 

and 

a k = H k T J?, 1 (z k -h k (0 o )) (12) 

is the constant contribution of measurement k to the gradient (6). In the original 
VSDF the linearization Hi is again taken to be the Jacobian of the measurement 
function evaluated at the state estimate. 

This is similar to the EKF, except that the EKF linearizes each measure- 
ment immediately upon incorporation into the state estimate. The VSDF opens 
the possibility of linearizing the term at some later time[13]. The advantage in 
linearizing the measurement later is that the point of expansion for the lineariza- 
tion is extimated using more data, and therefore has smaller variance. We can 
expect the linearization to occur at a more accurately estimated point, as shown 
in Figure 1. 


5 




Figure 1: Advantage of leaving nonlinear measurements in the filter (a) The 
estimate of the state x at time i given information up to time i leaves a large 
region of uncertainty (shaded), (b) After processing later observations, the 
VSDF has an estimate with lower variance, increasing the chances that our 
linearization will be a good one. 


The Jacobian and Hessian can now be expressed as a combination of terms 
from the linearized measurements and the nonlinear measurements 

a = a o +A o (0-0o)+^H k T R k - 1 (z k -h k (0)) 

A = Ao + ^H^Rk^Hk (13) 

Once measurements are linearized, there are parts of the state space that will no 
longer be a part of new measurements coming in (like old robot poses). Those 
subspaces can be eliminated from the filter. If we partition the state vector into 
8 = [#i T #2 T ] T and the gradient a = [ai T a 2 T ] T and 



then we can eliminate the first subspace by updating the parameter vector, 
gradient, and Hessian as follows, 

8 t — $ 2 , a i — a 2 

A <— A22 — A21A11 1 A12 (15) 

See [13] for details. The filter continues to incorporate new measurements as 
they become available, and linearizes them at some later time. 

4 Maximally Informative Statistics 

Good heuristics for how and when to linearize measurements can come from 
the notion of maximally informative statistics. A statistic r = t(z ) is some 
function of a data set which may be a reduction such as moment computation 
(mean, variance) , finding the maximum or minimum of the data, or estimation 
of parameters for some parametric model. Typically the goal is to compute a 
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Figure 2: Recursive filtering requires linearization of the nonlinear measurement 
function h(x) at a point. The Jacobian provides the instantaneous slope at a 
point estimate, but the maximally informative statistics principle indicates that 
an interpolation can fit the function better by minimizing expected square error 
under the probability distribution p(x) 


statistic which allows inference to be made without reconsidering the entire data 
set. 

A sufficient statistic is a statistic that can be used to make inferences about 
the data just as effectively as the original data itself. More formally, a statistic 
is sufficient if the distribution for an estimator under the statistic is the same 
as it is under the original data, 

p(6\t) = p(6\z) (16) 

An example of a sufficient statistic is the sample mean and sample variance for 
normally distributed data. 

A maximally informative statistic is a generalization of sufficiency. There is 
not always a sufficient statistic, but we can always find some statistic r G T 
which satisfies 

t = Argminter D(p(0\t)\\p(0\z)) (17) 

where ZJ(-| | -) is the Kullback-Liebler divergence, or relative entropy, and T is 
some class of statistic. Here we will let T be the set of all mean vectors and 
covariance matrices. When a sufficient statistic exists within T, the sufficient 
statistic is the maximally informative one and the KL divergence becomes zero. 

Under the assumption of normally distributed additive observation noise, we 
can write the true posterior for model parameters given data as 

p{6 |z) oc e -(»-M*» T *“ 1 (»-h(e)) ( 18 ) 

and the Gaussian approximation in state space can be written 

p(0 |0o, Co) (X e -(«-«o) T c 0 - 1 («-«o) ( 19 ) 
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If we assume that Co can be computed as Co 1 = H T f? X H then our job 
becomes one of finding the H such that the KL divergence 


D = E P 



( P(0 N A A 

\p{e\6o,C 0 )JJ 


(20) 


is minimized, which after manipulation becomes 


D = (21) 

E p [(h(0) - H(6 - 6 0 )) T R- 1 (h(6) - H (9 - 6> 0 ))] 


where E p denotes expectation under distribution p. Since R is positive definite, 
D is bounded below by zero and is minimized when the linearization H(0 — 9o) 
is most accurate over the region of high probability within p(6 |z). Typically 
the means for computing the linearization H in Extended Kalman filtering is 
to compute the Jacobian[15], which is only accurate for infinitesimal departures 
9 — 9o- Alternatives have been reported elsewhere, the DDF filter [16] replaces 
the Jacobian with a central divided difference, and the Unscented filter [8] uses 
a deterministic sampling scheme to compute the posterior covariance directly. 
Each of these approaches finds a linearization that is more accurate over the 
interval than the Jacobian computation, and performance increases over the 
EKF have been reported for both. In our work we compute a locally weighted 
linear interpolation of the function h k (-) using Gaussian quadrature. 

Gaussian quadrature is a means of numerically computing an integral using 
a small number of carefully chosen points and associated weights[14]. Deter- 
ministic rules for computing the samples and weights exist. There are specific 
rules for computing the samples to use for evaluating expectations as in (21) de- 
pending on the form of the distribution over which the expectation is computed. 
Because the envelope function p(9\z ) above is approximated by a Gaussian, we 
compute the Gauss-Hermite[14] quadrature points yi and associated weights Wi 
for the dimensions corresponding to the inputs to h k Q, namely the robot pose 
and landmark position related to that measurement. 

Once the quadrature points are computed, we fit the linear system 


/ VWh k ( yi ) > 


/ x /wTxi T N 

y/w2h k {y 2 ) 

— 

V^2 x 2 T 

\ VwNh k (y N ) ) 


\ J 


using least squares. The resulting coefficient matrix H k is used to update the 
Hessian matrix A in the VSDF. 


5 Using Cholesky factors 

A reduction in computational complexity can be realized by working with the 
Cholesky factorization of the Hessian matrix rather than the Hessian itself. 
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Figure 3: Ground truth for an example problem with four landmarks and a 
trajectory consisting of 50 robot poses. 


Cholesky factorization is a common means of solving a system of linear equations 
Ax = b when the coefficient matrix A is symmetric and positive definite. The 
cholesky factorization SS T = A is first computed, then the two triangular 
systems 

Sy = b 

S T x = y (23) 

are solved. The Cholesky decomposition of a dense N x N matrix can be 
computed in 0(N 3 ). The solution to the two triangular linear systems is 0(N 2 ). 

If rather than storing and manipulating the full Hessian matrix A we can 
store and manipulate its Cholesky factor S, then the factorization step can be 
avoided and the solution to (8) can be computed with 0(N 2 ) backsubstitu- 
tions alone. The only remaining problem is to determine how to propagate the 
Cholesky factor from step to step in the filter. 

There exist algorithms for performing the update and downdate of a Cholesky 
factor, where update is defined as the addition of a symmetric outer product 
A' = A + v T av and downdate is the subtraction of a symmetric outer product 
A' = A — v T av. For a rank-1 update or downdate these algorithms require 
0(N 2 ) computation, so a rank-fc update can be done in 0(kN 2 ). If we already 
have the Cholesky factor for the prior Hessian A 0 , then the step which com- 
bines the prior and the likelihood is given by (13) which can be computed as 
a Cholesky update, and the marginalization of state dimensions to be removed 
from the filter is given in (15) which can be computed as a Choleksy downdate. 
Since we can perform Cholesky updates and downdates for adding and remov- 
ing measurements and states in the filter, and use the Cholesky factors in the 
optimization step to solve the normal equations, we can do away with the full 
covariance matrix A and only maintain and use its Cholesky decomposition S. 
This technique has been used to modify Kalman filters for parameter estimation 
problems [15]. 


9 





X 


(b) 

Figure 4: Result from one trial on the example problem, (a) Initial state esti- 
mate using measurements from first five robot poses, (b) Final state estimate, 
including map and last five robot poses. This is the estimate after processing 
all information. 

The insertion and removal of measurements and states in the filter proceeds 
as before except that Cholesky updates and downdates replace the operations 
on Ao- 

6 Experimental Results 

Figure 3 shows an example problem with four landmarks and 50 robot poses 
in the trajectory. The small problem size is chosen to make the figures more 
legible. We ran 100 Monte Carlo trials of the filter algorithm by generating 
synthetic data z W for r = 1 . . . 100 using the generative model described in 
the introduction with Gaussian additive noise. For each trial the algorithm was 
used to produce a state estimate once using the Jacobian linearization H = 
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Figure 5: Map reconstruction error over time, ensemble average over 100 trials. 
Solid line shows result of linearization using quadrature based interpolation, 
dash-dot line shows result of linearization using Jacobian. 


and once using quadrature based interpolation. In this example the filter retains 
nonlinear measurements for 5 time steps before linearizing. Figure 4 shows an 
initial and final state estimate for one run of the filter. After filtering each data 
set, the squared error between the true map and the final estimated map was 
computed. 

Figure 5 shows the map reconstruction error over time, compared with 
ground truth and averaged over the 100 runs. Some variation is expected from 
one run to the next, so we cannot expect the quadrature based method to per- 
form strictly better than the Jacobian method on a per-trial basis, but over the 
course of many runs the quadrature method shows much better performance 
in terms of the accuracy of the final estimate. What is interesting to note in 
Figure 5 is that the Jacobian based method seems to converge to a solution with 
smaller reconstruction error early but then diverges. This may be because as 
the filter estimate changes the linearization as computed at old state estimates 
becomes less accurate and effects are not seen until the state estimate moves 
sufficiently far from where it was linearized. 

Figure 6 shows the convergence of the x and y coordinate of the landmark 
that appears in the lower left of Figure 3, which is the most difficult to estimate 
given the problem geometry. The evolution of the estimated position is shown 
for ten runs along with the estimated variance, and convergence to the true 
solution is seen. 


7 Summary and Conclusions 

In this paper we investigate the implication of maximally informative statis- 
tics on linearization for recursive filtering. The maxinfo criterion is shown to 
be equivalent to the expected squared error between the true nonlinear model 
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function and its linearization under the posterior. This metric is physically 
meaningful and very intuitive. It is used to determine a means of linearizing 
the measurements in the VSDF which is shown to outperform the analytic Ja- 
cobian for the problem considered here. The linearization itself is performed 
using Gaussian quadrature. We are investigating using the linearization error 
to decide when to linearize and when to leave measurements in the filter, al- 
though at some point computational resources may require linearization even if 
only a poor approximation is available. 

We have also introduced a square root formulation of the VSDF which cuts 
the computational complexity from 0(N 2 (L + N)) to 0(N(L + N)), where N is 
the number of landmarks in the map and L is the time lag for the VSDF. This is 
a significant for large maps since N could be in the hundreds or thousands. The 
algorithmic improvement only changes the way in which the sufficient statis- 
tics are represented and used in optimization, and does not affect the error 
performance. 
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Figure 6: Convergence of the lower right landmark, which is the one that is 
least certain in early estimates, (a) shows the convergence of the x coordinate 
for landmark #3. (b) shows the convergence of the y coordinate for landmark 
#3. Values for ten runs are plotted as dashed lines. Also plotted are the true 
value and the 95% confidence (3a) region centered on the true value in solid 
lines. 
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